docs(coding-agent/edit): document Unicode escape semantics in edit tool prompts by apoc · Pull Request #891 · can1357/oh-my-pi

apoc · 2026-04-30T10:15:00Z

Background

This replaces #889, which proposed runtime decoding of \uXXXX escape sequences in the edit/write pipeline. The Codex review on that PR (inline comment on normalize.ts:237) correctly identified that runtime decoding is unsound: it makes it impossible to write a single literal 6-char \u2192 source-code sequence to disk, because every viable JSON tool argument either decodes to the character or ends up with extra backslashes.

| LLM JSON tool arg | Parsed string | After runtime decode | On disk |
|---|---|---|---|
| `"\\u2192"` (natural) | `\u2192` (6 chars) | `→` | `→` ❌ |
| `"\\\\u2192"` (escape attempt) | `\\u2192` (7 chars) | unchanged by lookbehind | `\\u2192` ❌ |

That breaks JS regex source (`/\u2192/`), Python raw strings (`r"\u2192"`), JSON fixtures, and any code that contains `\uXXXX` literally.
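A minimal TypeScript sketch reproducing both table rows. `runtimeDecode` is a hypothetical lookbehind-based decoder written for illustration, not the code from #889. The JSON arguments use `String.raw` so every backslash appears exactly as the model would emit it.

```typescript
// Hypothetical runtime decoder in the spirit of #889 (illustrative, not its actual code):
// replace \uXXXX with the character unless the backslash is itself escaped.
function runtimeDecode(s: string): string {
  return s.replace(/(?<!\\)\\u([0-9a-fA-F]{4})/g, (_match: string, hex: string) =>
    String.fromCharCode(parseInt(hex, 16)),
  );
}

// Raw JSON text of the tool argument; String.raw keeps every backslash as typed.
const natural = String.raw`"\\u2192"`;         // two backslashes in the JSON
const escapeAttempt = String.raw`"\\\\u2192"`; // four backslashes in the JSON

for (const arg of [natural, escapeAttempt]) {
  const parsed: string = JSON.parse(arg); // what the tool receives
  const onDisk = runtimeDecode(parsed);   // what a decoding pipeline would write
  console.log(`${arg}  parsed=${parsed.length} chars  on disk: ${onDisk}`);
}
// Neither argument ever puts the 6-char text \u2192 on disk.
```

The natural argument collapses to `→`, and the escape attempt survives with both backslashes, so no input produces the literal sequence.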

Approach

Document the convention in tool prompts instead of decoding at runtime. JSON parsing already decodes `\uXXXX` natively, so:

- For the character `→`: emit `"\u2192"` (one backslash) in the JSON, or the literal `→` character. Both arrive at the tool as `→`.
- For the literal 6-char `\u2192` (regex source, raw strings, fixtures): emit `"\\u2192"` (two backslashes) so the JSON parser delivers the 6 chars verbatim.
- NEVER emit `"\\u2192"` (two backslashes) when you intend the character. That writes literal text, not a Unicode character.
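The convention relies only on standard `JSON.parse` / `JSON.stringify` behavior. A small TypeScript sketch of both directions (the `content` field name is illustrative):

```typescript
// Tool-call JSON exactly as emitted; String.raw keeps backslashes as typed.
const charArg = String.raw`{"content":"\u2192"}`;     // one backslash: the character
const literalArg = String.raw`{"content":"\\u2192"}`; // two backslashes: the 6-char text

const asChar: string = JSON.parse(charArg).content;
const asLiteral: string = JSON.parse(literalArg).content;

console.log(asChar, asChar.length);       // → 1
console.log(asLiteral, asLiteral.length); // \u2192 6

// The reverse direction: JSON.stringify yields the form the model should emit.
console.log(JSON.stringify("\u2192"));  // "→" (the raw character, also valid)
console.log(JSON.stringify("\\u2192")); // "\\u2192" (two backslashes)
```

Because the parser already does the decoding, the tool can write its argument through unchanged and both intents remain expressible.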

Add a `<unicode-content>` section to four tool prompts spelling out both directions and the negative example:

- `write.md` (`content`)
- `replace.md` (`old_text` / `new_text`)
- `patch.md` (`diff`: `+` lines and `op:create` payloads)
- `hashline.md` (`content`)

Files

```
packages/coding-agent/CHANGELOG.md                  | 3 +++
packages/coding-agent/src/prompts/tools/hashline.md | 7 +++++++
packages/coding-agent/src/prompts/tools/patch.md    | 7 +++++++
packages/coding-agent/src/prompts/tools/replace.md  | 7 +++++++
packages/coding-agent/src/prompts/tools/write.md    | 7 +++++++
5 files changed, 31 insertions(+)
```

Why this is better than runtime decoding

- No silent corruption of valid source code. `/\u2192/` written to a `.ts` file stays as 6 chars; the tool's job is to write what JSON delivered, not to second-guess it.
- No new failure mode. With runtime decoding, LLMs that already encoded their content correctly would get a hidden transformation between their intent and the file.
- Teaches the convention. The LLM reads Anthropic-style tool descriptions closely, so a clear "do this, not that" instruction guides every future call; a runtime decoder hides the rule.
- Reversible. If experience shows LLMs ignore the guidance and keep double-escaping, we can revisit. A documentation change is cheap to evolve.

Verification

- `bun run format-prompts`: clean (no formatting changes needed)
- All four prompts use the actual U+2192 character in prose where the character is intended, and the literal 6-char escape sequence in code spans where the literal text is intended (verified via `xxd`)
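The `xxd` check can be approximated by comparing UTF-8 bytes: U+2192 encodes as `e2 86 92`, while the literal escape is six ASCII bytes. A TypeScript sketch; `hex` and `classifyArrows` are hypothetical helpers, and the sample string stands in for prompt text that would be read from disk.

```typescript
// Hypothetical helper: UTF-8 bytes as hex pairs, similar to what xxd shows.
function hex(s: string): string {
  return Array.from(new TextEncoder().encode(s))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join(" ");
}

// Hypothetical helper: count real U+2192 characters vs literal 6-char "\u2192" sequences.
function classifyArrows(text: string): { chars: number; literals: number } {
  const chars = Array.from(text).filter((c) => c === "\u2192").length;
  const literals = text.split("\\u2192").length - 1;
  return { chars, literals };
}

console.log(hex("\u2192"));  // e2 86 92
console.log(hex("\\u2192")); // 5c 75 32 31 39 32

// Stand-in for one prompt line: the character in prose, the escape in a code span.
const sample = "Emit → for the character; write `\\u2192` for the literal text.";
const { chars, literals } = classifyArrows(sample);
console.log(chars, literals); // 1 1
```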

…ol prompts

Tool-call JSON is parsed before any of the file-writing tools sees its content arguments, so JSON's native `\uXXXX` decoding already covers the "write a Unicode character" case: emitting `"\u2192"` (one backslash) in the JSON delivers `→` to the tool. To write the *literal* 6-char source sequence `\u2192` (e.g. JS regex `/\u2192/`, Python `r"\u2192"`, JSON fixtures, docs about Unicode), emit `"\\u2192"` (two backslashes) so the JSON parser delivers the 6 chars verbatim.

Add a `<unicode-content>` section to the `write`, `replace`, `patch`, and `hashline` tool prompts spelling out both directions and an explicit "never emit two backslashes when you intend the character" rule.

This prevents the common LLM mistake of double-escaping (which produces the literal text on disk) without requiring runtime decoding heuristics that would make literal-escape writing impossible.

can1357 · 2026-04-30T12:00:44Z

This is a bit wasteful in terms of prompt space tbh, although I'm aware of the issue.
I wonder how CC deals with it? I don't remember seeing this in their prompt.

apoc · 2026-04-30T12:43:09Z

Agreed. I tried to fix it in tooling, but that had other issues :/

apoc · 2026-04-30T12:43:42Z

#889

apoc · 2026-05-02T08:12:52Z

> This is a bit wasteful in terms of prompt space tbh, although I'm aware of the issue. I wonder, how CC deals with it? I don't remember seeing this in their prompt.

I checked the CC code, and it doesn't appear to give this kind of edit any special handling either.

apoc wants to merge 1 commit into can1357:main from apoc:docs/edit-tools-unicode-escape-guidance

3 participants
